[WIP] LocalProcessProxy Kernels: user impersonate & working_dir #971
base: main
Conversation
if username:
    uid = getpwnam(username).pw_uid
    guid = grp.getgrnam(username).gr_gid
    os.chown(self.connection_file, uid, guid)
I had to change ownership of the connection files, because they are created by the gateway process (root), so when the kernel is launched as a regular user a Permission denied error occurs. Alternative approaches are welcome~
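As a standalone sketch of that ownership hand-off (an assumption-laden illustration, not the PR's final code: the helper name is hypothetical, and `pw_gid` is used instead of `grp.getgrnam` so that no group named after the user has to exist):

```python
import os
import pwd

def chown_to_kernel_user(path, username):
    # The gateway (often root) creates the connection file; hand
    # ownership to the kernel user so the demoted kernel can read it.
    pw = pwd.getpwnam(username)
    os.chown(path, pw.pw_uid, pw.pw_gid)
```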
Hi @cceyda - thanks for opening this pull request. I'd like to better understand your need for using local kernels in a hub environment. The primary use case for local kernels with EG is parity with Kernel Gateway environments, in which each user spawns their own instance of the gateway. You mention using containerized kernels, so I'd like to understand why you're not leveraging either of the container solutions that use the Kubernetes or Docker-related process proxy implementations. Could you elaborate a bit more on the scenario you're working within?

I think you should also be setting the

Regarding the changes, I think it's great that working-dir is getting applied to the LocalProcessProxy. However, regarding the setuid, I'd rather not introduce a dependency on jupyterhub, so we'd need a different means of doing that - of which I'm sure there are many. First, though, I want to make sure using the LocalProcessProxy is the right approach for this, especially if these kernels will be running in their own containers. Thank you.
I would rather avoid each user having to spin up their own kernel gateway. When JupyterHub & the gateway are run by an admin (let's say root), this also gives users the flexibility to choose between dockerized kernels, conda kernels, Spark, or just the local kernel. I think conda kernels also get launched using LocalProcessProxy.

Honestly, for small setups (1-2 servers, 3-6 users) managing Kubernetes is not really easy (or necessary).
Yes~ that's partially my reason for naming it user impersonation. I think we should do this impersonation only if EG_IMPERSONATION_ENABLED=True. I think the user authorization check is already done by

Yep~ that was my lazy way of testing whether this implementation worked. I will replace it with something standalone later.
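One way to drop the jupyterhub dependency would be a small standalone `preexec_fn` factory (a sketch under assumptions: `make_preexec_fn` is a hypothetical name, and the demotion itself only succeeds when the gateway runs as root):

```python
import os
import pwd

def make_preexec_fn(username):
    """Return a callable suitable for Popen(..., preexec_fn=...) that
    demotes the child process to `username`. A sketch of a standalone
    replacement for jupyterhub's set_user_setuid; requires root."""
    pw = pwd.getpwnam(username)

    def preexec():
        os.setgid(pw.pw_gid)                # drop the group first, while still root
        os.initgroups(username, pw.pw_gid)  # pick up supplementary groups
        os.setuid(pw.pw_uid)                # then drop the user
        os.chdir(pw.pw_dir)                 # start in the user's home directory
    return preexec
```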
@@ -571,8 +573,17 @@ def write_connection_file(self):
         self.hb_port = ports[3]
         self.control_port = ports[4]

-        return super(RemoteKernelManager, self).write_connection_file()
+        cf = super(RemoteKernelManager, self).write_connection_file()
+        username = self.user_overrides.get('KERNEL_USERNAME', None)
Is there a better way to get the username at this point? Like:

username = KernelSessionManager.get_kernel_username(**kwargs)

It would be best for write_connection_file to write the file as the correct user rather than chown-ing it later, but I don't know how. https://github.com/jupyter/jupyter_client/blob/98faaf91d00b4fbf17b2fbf0b08b1fd23adc27bf/jupyter_client/connect.py#L42
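One hedged alternative to chown-ing afterwards is to switch the effective UID just for the duration of the write (`write_as_user` is a hypothetical helper, and switching to another user requires the gateway to run as root):

```python
import os
import pwd

def write_as_user(write_fn, username):
    # Temporarily assume the kernel user's effective UID so the
    # connection file is created already owned by that user.
    pw = pwd.getpwnam(username)
    saved = os.geteuid()
    os.seteuid(pw.pw_uid)   # only root may switch to another user
    try:
        return write_fn()
    finally:
        os.seteuid(saved)   # always restore the gateway's identity
```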
user = kwargs["env"].get("KERNEL_USERNAME", None)
self.log.info(f"KARGS {kwargs}")

if user:
We can check here whether EG_IMPERSONATION_ENABLED and user are both set. (The current EG behavior ignores KERNEL_USERNAME.)

By the way, I will also remove the unnecessary logging; it was there to help me understand how EG works.
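That gating could look something like the following sketch (the exact truthy-value parsing of EG_IMPERSONATION_ENABLED is an assumption, and `should_impersonate` is a hypothetical helper name):

```python
import os

def impersonation_enabled():
    # EG_IMPERSONATION_ENABLED is the existing EG toggle; treat common
    # truthy spellings as enabled (parsing details are an assumption).
    value = os.getenv("EG_IMPERSONATION_ENABLED", "false")
    return value.strip().lower() in ("1", "true", "yes")

def should_impersonate(kwargs):
    # Demote only when impersonation is on AND a kernel user was supplied.
    user = kwargs.get("env", {}).get("KERNEL_USERNAME")
    return bool(user) and impersonation_enabled()
```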
@@ -147,7 +149,7 @@ async def start_kernel(self, *args, **kwargs):
         username = KernelSessionManager.get_kernel_username(**kwargs)
         self.log.debug("RemoteMappingKernelManager.start_kernel: {kernel_name}, kernel_username: {username}".
                        format(kernel_name=kwargs['kernel_name'], username=username))

+        self.connection_dir = os.path.join(os.path.expanduser('~' + username), ".local/share/jupyter/runtime/")
self.connection_dir is normally determined by JUPYTER_RUNTIME_DIR, but if that directory is not among the paths searched for the user's connection files, things break, so I made another hacky update. https://github.com/jupyter/jupyter_client/blob/98faaf91d00b4fbf17b2fbf0b08b1fd23adc27bf/jupyter_client/connect.py#L191
Would it be better to call jupyter_core.paths.get_home_dir() to (presumably) get /root, then replace that portion of jupyter_runtime_dir() with the target user's home directory? Otherwise, this isn't producing the correct path relative to the platform.
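The suggested rebasing reduces to a small pure function; in the real code the inputs would presumably come from jupyter_core.paths.jupyter_runtime_dir() and get_home_dir(), plus pwd.getpwnam(username).pw_dir for the target user (the function name here is hypothetical):

```python
def rebase_runtime_dir(runtime_dir, gateway_home, user_home):
    # Swap the gateway user's home prefix (e.g. /root) for the target
    # user's home, keeping the platform-specific tail intact.
    if runtime_dir.startswith(gateway_home):
        return user_home + runtime_dir[len(gateway_home):]
    return runtime_dir
```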
We should also ensure the user's home directory exists, since KERNEL_USERNAME does not necessarily correspond to an actual user. (I suppose setuid will enforce that constraint, though.)
How will these kinds of files be cleaned up?
self.connection_dir is on the multi kernel manager, which spans all kernel manager instances - so this isn't going to work. I'm not sure the connection information needs to be isolated on a per-user level; it isn't today, at least.
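Since connection_dir is shared across kernel managers, any per-user isolation would have to happen per kernel instead, e.g. by computing the connection file path on each kernel manager instance (a hypothetical sketch, not EG's current behavior):

```python
import os

def per_kernel_connection_file(runtime_dir, kernel_id):
    # Leave the shared connection_dir untouched and isolate at the
    # individual-kernel level via a per-kernel file name instead.
    return os.path.join(runtime_dir, "kernel-%s.json" % kernel_id)
```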
Probably best just to replicate the conversation - thank you for your response.
OK - fair enough. Just be aware that, in this case, the EG server will have to provide the total resources consumed by all active (and local) kernels across all users.
Keep in mind that unless EG is running in a matching containerized environment, docker or k8s kernels cannot be launched.
Correct, conda kernels essentially fabricate the kernelspec so, in that case, they can only be local kernels.
I see. I misunderstood this comment from your description: "So I can launch containers/kernels on behalf of users."
Correct. The impersonation that exists today is really only used by YARN/Spark, but this capability at the local level seems useful, and, yes, we should only perform setuid if enabled. I also think we should require that the gateway user be unauthorized from launching local kernels.
It is also configured at a global level and "unauthorized" users take precedence no matter where they are defined, whereas authorized users at the kernel level can override globally configured authorized users.
Right on. Thanks. |
Thanks for the diagram. Got a few questions, observations:
I'm using a custom ProcessProxy for launching docker containers (custom images), which also adds some logic about volumes to mount.
I'm using a custom spawner with JupyterHub; it takes the server address as input. If
Yes, in my case it is okay if resources are shared (it ends up that way anyway, whether we use jupyter or not). Also containerization is mostly for the sake of reproducibility/convenience and not security isolation.
I prefer one EG per machine plus one JupyterLab (per user, per machine) because it is easier than sharing files between machines, so the DockerProcessProxy logic has been enough. I also didn't bother setting up Docker Swarm since I heard it is being superseded by Kubernetes. (Ideally I want to use Kubernetes, but I don't have the energy to set everything up 😅)
Nice. Are you thinking about contributing that back to the project at all?
I thought it was a virtual network thing - but that might just be Kubernetes and Swarm, and the Docker process proxy just followed suit. Since they'll (DPP) always be on the same host, I could see that working, although, in the big picture of EG, we want kernels off-host. If you could address the build issues (
I keep meaning to do that, but my timeline is chaotic. I thought it better to start with this smaller PR and work up to that one.

I will mark this as WIP and address the issues you mentioned later.
If the kernel gateway is run as root, local Python kernels are launched as root!
Related issue: #789
Please provide feedback~
What is a local kernel?
Example:
Why run gateway as root?
So I can launch containers/kernels on behalf of users.
My setup:
JupyterHub (managed by root) -> Jupyterlab (user jupyterlab instance spawned by jupyterhub) -> Gateway Kernels (managed by root)
- Style check code
- Better error messages